Navigable telepresence method and system utilizing an array of cameras

ABSTRACT

Methods and systems permit one or more users to navigate through imagery of an environment. The system may include a first user interface device having first user inputs associated with first movement through the environment and a second user interface device having second user inputs associated with a second movement through the environment. Thus, a first user and a second user are able to navigate simultaneously and independently. In certain embodiments the system processes imagery of the environment to smooth user navigation through the environment.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims priority under 35 U.S.C. §120 to commonly assigned pending U.S. patent application Ser. No. 13/949,132, filed Jul. 23, 2013, which is a continuation of U.S. patent application Ser. No. 12/610,188, filed on Oct. 30, 2009, which is a continuation of U.S. patent application Ser. No. 11/359,233, filed Feb. 21, 2006, which issued as U.S. Pat. No. 7,613,999, which is a continuation of U.S. patent application Ser. No. 10/308,230, filed on Dec. 2, 2002, which is a continuation of U.S. patent application Ser. No. 09/419,274, filed on Oct. 15, 1999, which issued as U.S. Pat. No. 6,522,325, which is a continuation-in-part of U.S. patent application Ser. No. 09/283,413, filed on Apr. 1, 1999, which issued as U.S. Pat. No. 6,535,226, which claims the benefit of priority to U.S. Provisional Application Ser. No. 60/080,413, filed on Apr. 2, 1998, all of which are hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to telepresence systems and methods.

2. Description of Related Art

In general, a need exists for the development of telepresence systems suitable for use with static venues, such as museums, and dynamic venues or events, such as music concerts. The viewing of such venues is limited by time, geographical location, and the viewer capacity of the venue. For example, potential visitors to a museum may be prevented from viewing an exhibit due to the limited hours the museum is open. Similarly, music concert producers must turn back fans due to the limited seating of an arena. In short, limited access to venues reduces the revenue generated.

In an attempt to increase the revenue stream from both static and dynamic venues, such venues have been recorded for broadcast or distribution. In some instances, dynamic venues are also broadcast live. While such broadcasting increases access to the venues, it involves considerable production effort. Typically, recorded broadcasts must be cut and edited, as views from multiple cameras are pieced together. These editorial and production efforts are costly.

In some instances, the broadcast resulting from these editorial and production efforts provides viewers with limited enjoyment. Specifically, the broadcast is typically based on filming the venue from a finite number of predetermined cameras. Thus, the broadcast contains limited viewing angles and perspectives of the venue. Moreover, the viewing angles and perspectives presented in the broadcast are those selected by a producer or director during the editorial and production process; there is no viewer autonomy. Furthermore, although the broadcast is often recorded for multiple viewings, the broadcast has limited content life because each viewing is identical to the first. Because each showing looks and sounds the same, viewers rarely come back for multiple viewings.

A viewer fortunate enough to attend a venue in person will encounter many of the same problems. For example, a museum-goer must remain behind the barricades, viewing exhibits from limited angles and perspectives. Similarly, concert-goers are often restricted to a particular seat or section in an arena. Even if a viewer were allowed free access to the entire arena to videotape the venue, such a recording would also have limited content life because each viewing would be the same as the first. Therefore, a need exists for a telepresence system that preferably provides user autonomy while resulting in recordings with enhanced content life at a reduced production cost.

Apparently, attempts have been made to develop telepresence systems to satisfy some of the foregoing needs. One telepresence system is described in U.S. Pat. No. 5,708,469 for Multiple View Telepresence Camera Systems Using A Wire Cage Which Surrounds A Polarity Of Multiple Cameras And Identifies The Fields Of View, issued Jan. 13, 1998. The system disclosed therein includes a plurality of cameras, wherein each camera has a field of view that is space-contiguous with and at a right angle to at least one other camera. In other words, it is preferable that the camera fields of view do not overlap each other. A user interface allows the user to jump between views. In order for the user's view to move through the venue or environment, a moving vehicle carries the cameras.

This system, however, has several drawbacks. For example, in order for a viewer's perspective to move through the venue, the moving vehicle must be actuated and controlled by the viewer. In this regard, operation of the system is complicated. Furthermore, because the camera views are contiguous, typically at right angles, changing camera views results in a discontinuous image.

Other attempts at providing a telepresence system have taken the form of 360 degree camera systems. One such system is described in U.S. Pat. No. 5,745,305 for Panoramic Viewing Apparatus, issued Apr. 28, 1998. The system described therein provides a 360 degree view of an environment by arranging multiple cameras around a pyramid shaped reflective element. Each camera, all of which share a common virtual optical center, receives an image from a different side of the reflective pyramid. Other types of 360 degree camera systems employ a parabolic lens or a rotating camera.

Such 360 degree camera systems also suffer from drawbacks. In particular, such systems limit the user's view to 360 degrees from a given point perspective. In other words, 360 degree camera systems provide the user with a panoramic view from a single location. Only if the camera system were mounted on a moving vehicle remotely controlled by the viewer could the viewer navigate and experience simulated movement through an environment.

U.S. Pat. No. 5,187,571 for Television System For Displaying Multiple Views of A Remote Location, issued Feb. 16, 1993, describes a camera system similar to the 360 degree camera systems described above. The system described allows a user to select an arbitrary and continuously variable section of an aggregate field of view. Multiple cameras are aligned so that each camera's field of view merges contiguously with those of adjacent cameras, thereby creating the aggregate field of view. The aggregate field of view may expand to cover 360 degrees. In order to create the aggregate field of view, the cameras' views must be contiguous. In order for the camera views to be contiguous, the cameras have to share a common point perspective, or vertex. Thus, like the previously described 360 degree camera systems, the system of U.S. Pat. No. 5,187,571 limits a user's view to a single point perspective, rather than allowing a user to experience movement in perspective through an environment.

Also, with regard to the system of U.S. Pat. No. 5,187,571, in order to achieve the continuity between camera views, a relatively complex arrangement of mirrors is required. Additionally, each camera seemingly must also be placed in the same vertical plane.

Thus, a need still exists for an improved telepresence system that provides the ability to better simulate a viewer's actual presence in a venue, preferably in real time.

3. Summary of Embodiments of the Invention

These and other needs are satisfied by embodiments of the present invention. A telepresence method and system according to one embodiment of the present invention permits one or more users to navigate through imagery of an environment. One such system receives, from a first user interface device associated with the first user, first user inputs associated with the first view through the environment, and receives, from a second user interface device associated with the second user, second user inputs associated with the second view through the environment. The system receives electronic imagery of progressively different perspectives of the environment having overlapping fields of view and generates electronic mosaic imagery from the electronic imagery of the environment. Based on the first user inputs, the system provides to the first user interface device mosaic imagery along the first view, thereby allowing the first user to navigate along the first view of the environment, and based on the second user inputs, provides to the second user interface device mosaic imagery along the second view, thereby allowing the first user and second user to navigate simultaneously and independently along the first view and second view of the environment, respectively. In certain embodiments the system processes the imagery to smooth user navigation through the environment.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overall schematic of one embodiment of the present invention.

FIG. 2a is a perspective view of a camera and a camera rail section of the array according to one embodiment of the present invention.

FIGS. 2b-2d are side plan views of a camera and a camera rail according to one embodiment of the present invention.

FIG. 2e is a top plan view of a camera rail according to one embodiment of the present invention.

FIG. 3 is a perspective view of a portion of the camera array according to one embodiment of the present invention.

FIG. 4 is a perspective view of a portion of the camera array according to an alternate embodiment of the present invention.

FIG. 5 is a flowchart illustrating the general operation of the user interface according to one embodiment of the present invention.

FIG. 6 is a flowchart illustrating in detail a portion of the operation shown in FIG. 5.

FIG. 7a is a perspective view of a portion of one embodiment of the present invention illustrating the arrangement of the camera array relative to objects being viewed.

FIGS. 7b-7g illustrate views from the perspectives of selected cameras of the array in FIG. 7a.

FIG. 8 is a schematic view of an alternate embodiment of the present invention.

FIG. 9 is a schematic view of a server according to one embodiment of the present invention.

FIG. 10 is a schematic view of a server according to an alternate embodiment of the present invention.

FIG. 11 is a top plan view of an alternate embodiment of the present invention.

FIG. 12 is a flowchart illustrating in detail the image capture portion of the operation of the embodiment shown in FIG. 11.

DESCRIPTION OF CERTAIN EMBODIMENTS

1. General Description of Preferred Embodiments

The present invention relates to a telepresence system that, in a preferred embodiment, uses modular, interlocking arrays of microcameras. The cameras are on rails, with each rail holding a plurality of cameras. These cameras, each locked in a fixed relation to every adjacent camera on the array and dispersed dimensionally in a given environment, transmit image output to an associated storage node, thereby enabling remote viewers to navigate through such environment with the same moving light reflections and shadows that characterize an actual in-environment transit.

In another preferred embodiment, the outputs of these microcameras are linked by tiny (less than half the width of a human hair) Vertical Cavity Surface Emitting Lasers (VCSELs), or alternatively any photonic or optoelectric throughput device, to optical fibers, fed through area net hubs, buffered on server arrays or server farms (either for recording or (instantaneous) relay) and sent to viewers at remote terminals, interactive wall screens, or mobile image appliances (like Virtual Retinal Displays). Each remote viewer, through an intuitive graphical user interface (GUI), can navigate effortlessly through the environment, enabling seamless movement through the event.

This involves a multiplexed, electronic, photonic, optoelectronic, or any data throughput-configured switching process (invisible to the viewer) which moves the viewer's point perspective from camera to camera. Rather than relying, per se, on physically moving a microcamera through space (i.e., vesting a viewer with control of a vehicle carrying one or more cameras), the system uses the multiplicity of positioned microcameras to move the viewer's perspective from microcamera node to adjacent microcamera node in a way that provides the viewer with a sequential visual and acoustical path throughout the extent of the array. This allows the viewer to fluidly track or dolly through a 3-dimensional remote environment, to move through an event and make autonomous real-time decisions about where to move and when to linger.

Instead of vesting the viewer with the capacity to physically move a robotic camera or vehicle on which the camera or cameras are mounted, which would immediately limit the number of viewers that could simultaneously control their own course, one or more viewers can navigate via storage nodes containing images of an environment associated with a pre-existing array of cameras. The user can move around the environment in any direction: clockwise or counterclockwise, up, down, closer to or further away from the environment, or some combination thereof. Moreover, image output mixing, such as mosaicing and tweening, effectuates seamless motion throughout the environment.

2. Detailed Description of Preferred Embodiments

Certain embodiments of the present invention will now be described in greater detail with reference to the drawings. It is understood that the operation and functionality of many of the components of the embodiments described herein are known to one skilled in the art and, as such, the present description does not go into detail regarding such operation and functionality.

A telepresence system 100 according to the present invention is shown in FIG. 1. The telepresence system 100 generally includes an array 10 of cameras 14 coupled to a server 18, which in turn is coupled to one or more users 22 each having a user interface/display device 24. As will be understood by one skilled in the art, the operation and functionality of the embodiment described herein is provided, in part, by the server and user interface/display device. While the operation of these components is not described by way of particular code listings or logic diagrams, it is to be understood that one skilled in the art will be able to arrive at suitable implementations based on the functional and operational details provided herein. Furthermore, the scope of the present invention is not to be construed as limited to any particular code or logic implementation.

In the present embodiment, the camera array 10 is conceptualized as being in an X, Z coordinate system. This allows each camera to have an associated, unique node address comprising an X and a Z coordinate (X, Z). In the present embodiment, for example, a coordinate value corresponding to an axis of a particular camera represents the number of camera positions along that axis the particular camera is displaced from a reference camera. In the present embodiment, from the user's perspective the X axis runs left and right, and the Z axis runs down and up. Each camera 14 is identified by its X, Z coordinate. It is to be understood, however, that other methods of identifying cameras 14 can be used. For example, other coordinate systems, such as those noting angular displacement from a fixed reference point, as well as coordinate systems that indicate relative displacement from the current camera node, may be used. In another alternate embodiment, the array is three dimensional, located in an X, Y, Z coordinate system.
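By way of illustration only, the node addressing described above might be sketched as follows. The names NodeAddress and relative_displacement are hypothetical and do not appear in the description; the sketch assumes integer coordinates counted in camera positions from a reference camera at (0, 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAddress:
    """Hypothetical node address: camera positions counted from a reference camera."""
    x: int  # displacement along the X axis (left/right from the user's perspective)
    z: int  # displacement along the Z axis (down/up from the user's perspective)


def relative_displacement(current: NodeAddress, target: NodeAddress) -> tuple:
    """Express a target camera relative to the current camera node."""
    return (target.x - current.x, target.z - current.z)


reference = NodeAddress(0, 0)
camera = NodeAddress(3, -2)  # three positions to the right, two positions down
```

Because the address is immutable and hashable, it could also serve as a lookup key for the stored output of each camera; a three dimensional array would add a Y field in the same manner.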

The array 10 comprises a plurality of rails 12, each rail 12 including a series of cameras 14. In the present preferred embodiment, the cameras 14 are microcameras. The outputs from the microcameras 14 are coupled to the server 18 by means of local area hubs 16. The local area hubs 16 gather the outputs and, when necessary, amplify the outputs for transmission to the server 18. In an alternate embodiment, the local area hubs 16 multiplex the outputs for transmission to the server 18. Although the figure depicts the communication links 15 between the camera 14 and the server 18 as being hardwired, it is to be understood that wireless links may be employed. Thus, it is within the scope of the present invention for the communication links 15 to take the form of fiber optics, cable, satellite, microwave transmission, internet, and the like.

Also coupled to the server 18 is an electronic storage device 20. The server 18 transfers the outputs to the electronic storage device 20. The electronic (mass) storage device 20, in turn, transfers each camera's output onto a storage medium or means, such as CD-ROM, DVD, tape, platter, disk array, or the like. The output of each camera 14 is stored in a particular location on the storage medium associated with that camera 14 or is stored with an indication to which camera 14 each stored output corresponds. For example, the output of each camera 14 is stored in contiguous locations on a separate disk, tape, CD-ROM, or platter. As is known in the art, the camera output may be stored in a compressed format, such as JPEG, MPEG1, MPEG2, and the like. Having stored each output allows a user to later view the environment over and over again, each time moving through the array 10 in a new path, as described below. In some embodiments of the present invention, such as those providing only real-time viewing, no storage device is required.
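As a minimal sketch of storing each output with an indication of the camera to which it corresponds, an in-memory index keyed by node address could look like the following. The class CameraStore and its methods are assumptions made for illustration; the description does not prescribe any storage layout or format.

```python
from collections import defaultdict


class CameraStore:
    """Hypothetical store: frames are kept per camera, keyed by (x, z) node address."""

    def __init__(self):
        self._frames = defaultdict(list)

    def record(self, address, frame):
        # Append the frame to the contiguous sequence kept for this camera.
        self._frames[address].append(frame)

    def playback(self, path):
        # path is a list of (address, frame_index) pairs chosen by the viewer.
        return [self._frames[address][index] for address, index in path]


store = CameraStore()
for t in range(2):
    store.record((0, 0), f"cam(0,0) frame {t}")
    store.record((1, 0), f"cam(1,0) frame {t}")

# Two viewings of the same recording along different paths through the array.
first_viewing = store.playback([((0, 0), 0), ((0, 0), 1)])
second_viewing = store.playback([((0, 0), 0), ((1, 0), 1)])
```

Keeping the outputs separated by camera is what lets a later viewing follow a path that differs from any earlier one.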

As will be described in detail below, the server 18 receives output from the cameras 14 in the array. The server 18 processes these outputs for either storage in the electronic storage device 20, transmission to the users 22, or both.

Although the server 18 is configured to provide the functionality of the system 100 in the present embodiment, it is to be understood that other processing elements may provide the functionality of the system 100. For example, in alternate embodiments, the user interface device is a personal computer programmed to interpret the user input and transmit an indication of the desired current node address, buffer outputs from the array, and provide others of the described functions.

As shown, the system 100 can accommodate (but does not require) multiple users 22. Each user 22 has associated therewith a user interface device including a user display device (collectively 24). For example, user 22-1 has an associated user interface device and a user display device in the form of a computer 24-1 having a monitor and a keyboard. User 22-2 has associated therewith an interactive wall screen 24-2 which serves as a user interface device and a user display device. The user interface device and the user display device of user 22-3 include a mobile audio and image appliance 24-3. A digital interactive TV 24-4 is the user interface device and user display device of user 22-4. Similarly, user 22-5 has a voice recognition unit and monitor 24-5 as the user interface and display devices. It is to be understood that the foregoing user interface devices and user display devices are merely exemplary; for example, other interface devices include a mouse, touch screen, biofeedback devices, as well as those identified in U.S. Provisional Patent Application Ser. No. 60/080,413 and the like.

As described in detail below, each user interface device 24 has associated therewith user inputs. These user inputs allow each user 22 to move or navigate independently through the array 10. In other words, each user 22 enters inputs to generally select which camera outputs are transferred to the user display device. Preferably, each user display device includes a graphical representation of the array 10. The graphical representation includes an indication of which camera in the array is providing the output being viewed. The user inputs allow each user to not only select particular cameras, but also to select relative movement or navigational paths through the array 10.

As shown in FIG. 1, each user 22 may be coupled to the server 18 by an independent communication link. Furthermore, each communication link may employ different technology. For example, in alternate embodiments, the communication links include an internet link, a microwave signal link, a satellite link, a cable link, a fiber optic link, a wireless link, and the like.

It is to be understood that the array 10 provides several advantages. For example, because the array 10 employs a series of cameras 14, no individual camera, or the entire array 10 for that matter, need be moved in order to obtain a seamless view of the environment. Instead, the user navigates through the array 10, which is strategically placed through and around the physical environment to be viewed. Furthermore, because the cameras 14 of the array 10 are physically located at different points in the environment to be viewed, a user is able to view changes in perspective, a feature unavailable to a single camera that merely changes focal length.

Microcameras

Each camera 14 is preferably a microcamera. The microcameras, which are microlenses mounted on thumbnail-sized CMOS active pixel sensor (APS) microchips, are arranged in patterns that enable viewers to move radially, in straight lines, or in fluid combinations thereof. The cameras are produced in a mainstream manufacturing process by several companies, including Photobit, Pasadena, Calif.; Sarnoff Corporation, Princeton, N.J.; and VLSI Vision, Ltd., Edinburgh, Scotland.

Structure of the Array

The structure of the array 10 will now be described in greater detail with reference to FIGS. 2a-2e. In general, the camera array 10 of the present embodiment comprises a series of modular rails 12 carrying microcameras 14. The structure of the rails 12 and cameras 14 will now be discussed in greater detail with reference to FIGS. 2a through 2d. Each camera 14 includes registration pins 34. In the preferred embodiment, the cameras 14 utilize VCSELs to transfer their outputs to the rail 12. It is to be understood that the present invention is not limited to any particular type of camera 14, however, or even to an array 10 consisting of only one type of camera 14.

Each rail 12 includes two sides, 12a, 12b, at least one of which 12b is hingeably connected to the base 12c of the rail 12. The base 12c includes docking ports 36 for receiving the registration pins 34 of the camera 14. When the camera 14 is seated on a rail 12 such that the registration pins 34 are fully engaged in the docking ports 36, the hinged side 12b of the rail 12 is moved against the base 32 of the camera 14, thereby securing the camera 14 to the rail 12.

Each rail 12 further includes a first end 38 and a second end 44. The first end 38 includes, in the present embodiment, two locking pins 40 and a protected transmission relay port 42 for transmitting the camera outputs. The second end 44 includes two guide holes 46 for receiving the locking pins 40, and a transmission receiving port 48. Thus, the first end 38 of one rail 12 is engageable with a second end 44 of another rail 12. Therefore, each rail 12 is modular and can be functionally connected to another rail to create the array 10.

Once the camera 14 is securely seated to the rail 12, the camera 14 is positioned such that the camera output may be transmitted via the VCSEL to the rail 12. Each rail 12 includes communication paths for transmitting the output from each camera 14.

Although the array 10 is shown having a particular configuration, it is to be understood that virtually any configuration of rails 12 and cameras 14 is within the scope of the present invention. For example, the array 10 may be a linear array of cameras 14, a 2-dimensional array of cameras 14, a 3-dimensional array of cameras 14, or any combination thereof. Furthermore, the array 10 need not be comprised solely of linear segments, but rather may include curvilinear sections.

The array 10 is supported by any of a number of support means. For example, the array 10 can be fixedly mounted to a wall or ceiling; the array 10 can be secured to a moveable frame that can be wheeled into position in the environment or supported from cables.

FIG. 3 illustrates an example of a portion of the array 10. As shown, the array 10 comprises five rows of rails 12a through 12e. Each of these rails 12a-12e is directed towards a central plane, which substantially passes through the center row 12c. Consequently, for any object placed in the same plane as the middle row 12c, a user would be able to view the object essentially from the bottom, front, and top.

As noted above, the rails 12 of the array 10 need not have the same geometry. For example, some of the rails 12 may be straight while others may be curved. For example, FIG. 4 illustrates the camera alignment that results from utilizing curved rails. It should be noted that rails in FIG. 4 have been made transparent so that the arrangement of cameras 14 may be easily seen.

In an alternate embodiment, each rail is configured in a step-like fashion or an arc with each camera above (or below) and in front of a previous camera. In such an arrangement, the user has the option of moving forward through the environment.

It is to be understood that the spacing of the microcameras 14 depends on the particular application, including the objects being viewed, the focal length of the microcameras 14, and the speed of movement through the array 10. In one embodiment the distance between microcameras 14 can be approximated by analogy to a conventional movie reel recording projector. In general, the speed of movement of a projector through an environment divided by the frames per unit of time results in a frame-distance ratio.

For example, as shown by the following equations, in some applications a frame is taken every inch. A conventional movie projector records twenty-four frames per second. When such a projector is moved through an environment at two feet per second, a frame is taken approximately every inch.

2 ft/sec ÷ 24 frames/sec = 2 ft / 24 frames = 1 ft / 12 frames = 12 inches / 12 frames = 1 inch / 1 frame = 1 frame per inch.

A frame of the projector is analogous to a camera 14 in the present invention. Thus, where one frame per inch results in a movie having a seamless view of the environment, so too does one camera 14 per inch. Thus, in one embodiment of the present invention the cameras 14 are spaced approximately one inch apart, thereby resulting in a seamless view of the environment.
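The frame-distance analogy can be restated as a short calculation. The function name camera_spacing_inches is hypothetical, and the sketch assumes only the relationship given above, namely speed of movement divided by frames per unit of time.

```python
def camera_spacing_inches(speed_ft_per_s: float, frames_per_s: float) -> float:
    """Distance travelled per frame, in inches, for a given speed and frame rate."""
    inches_per_second = speed_ft_per_s * 12.0
    return inches_per_second / frames_per_s


# Two feet per second at twenty-four frames per second, as in the example above.
spacing = camera_spacing_inches(2.0, 24.0)
```

For the figures used above, the result is one inch per frame, which corresponds to the approximately one inch camera spacing of this embodiment. Other speeds or frame rates would suggest other spacings under the same analogy.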

Navigation Through the System

The general operation of the present embodiment will now be described with reference to FIG. 5 and continuing reference to FIG. 1. As shown in step 110, the user is presented with a predetermined starting view of the environment corresponding to a starting camera. It is to be understood that the operation of the system is controlled, in part, by software residing in the server. As noted above, the system associates each camera in the array with a coordinate. Thus, the system is able to note the coordinates of the starting camera node. The camera output, and thus the corresponding view, changes only upon receiving a user input.

When the user determines that they want to move or navigate through the array, the user enters a user input through the user interface device 24. As described below, the user inputs of the present embodiment generally include moving to the right, to the left, up, or down in the array. Additionally, a user may jump to a particular camera in the array. In alternate embodiments, a subset of these or other inputs, such as forward, backward, diagonal, over, and under, are used. The user interface device, in turn, transmits the user input to the server in step 120.

Next, the server receives the user input in step 130 and proceeds to decode the input. In the present embodiment, decoding the input generally involves determining whether the user wishes to move to the right, to the left, up, or down in the array.

The server 18 first determines whether the input corresponds to moving to the user's right in the array 10. This determination is shown in step 140. If the received user input does correspond to moving to the right, the current node address is incremented along the X axis in step 150 to obtain an updated address.

If the received user input does not correspond to moving to the right in the array, the server 18 then determines whether the input corresponds to moving to the user's left in the array 10 in step 160. Upon determining that the input does correspond to moving to the left, the server 18 then decrements the current node address along the X axis to arrive at the updated address. This is shown in step 170.

If the received user input does not correspond to either moving to the right or to the left, the server 18 then determines whether the input corresponds to moving up in the array. This determination is made in step 180. If the user input corresponds to moving up, in step 190, the server 18 increments the current node address along the Z axis, thereby obtaining an updated address.

Next, the server 18 determines whether the received user input corresponds to moving down in the array 10. This determination is made in step 200. If the input does correspond to moving down in the array 10, in step 210 the server 18 decrements the current node address along the Z axis.

Lastly, in step 220 the server 18 determines whether the received user input corresponds to jumping or changing the view to a particular camera 14. As indicated in FIG. 5, if the input corresponds to jumping to a particular camera 14, the server 18 changes the current node address to reflect the desired camera position. Updating the node address is shown as step 230. In an alternate embodiment, the input corresponds to jumping to a particular position in the array 10, not identified by the user as being a particular camera but by some reference to the venue, such as stage right.

It is to be understood that the server 18 may decode the received user inputs in any of a number of ways, including in any order. For example, in an alternate embodiment the server 18 first determines whether the user input corresponds to up or down. In another alternate, preferred embodiment, user navigation includes moving forward, backward, to the left and right, and up and down through a three dimensional array.

If the received user input does not correspond to any of the recognized inputs, namely to the right, to the left, up, down, or jumping to a particular position in the array 10, then in step 240, the server 18 causes a message signal to be transmitted to the user display device 24, causing a message to be displayed to the user 22 that the received input was not understood. Operation of the system 100 then continues with step 120, and the server 18 awaits receipt of the next user input.

After adjusting the current node address, either by incrementing or decrementing the node address along an axis or by jumping to a particular node address, the server 18 proceeds in step 250 to adjust the user's view. Once the view is adjusted, operation of the system 100 continues again with step 120 as the server 18 awaits receipt of the next user input.
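The decoding of steps 140 through 240 can be sketched, under stated assumptions, as a single function that returns the updated node address. The command strings, the function name decode_input, and the tuple form of the address are all hypothetical; the description leaves the input encoding and any limits on the extent of the array to the implementer, and this sketch performs no bounds checking.

```python
# Hypothetical command names mapped to (X, Z) adjustments of the node address.
MOVES = {
    "right": (1, 0),   # steps 140 and 150: increment along the X axis
    "left": (-1, 0),   # steps 160 and 170: decrement along the X axis
    "up": (0, 1),      # steps 180 and 190: increment along the Z axis
    "down": (0, -1),   # steps 200 and 210: decrement along the Z axis
}


def decode_input(current, command, target=None):
    """Return (updated_address, message) for one received user input."""
    if command in MOVES:
        dx, dz = MOVES[command]
        return (current[0] + dx, current[1] + dz), None
    if command == "jump" and target is not None:
        return tuple(target), None              # steps 220 and 230
    return current, "input not understood"      # step 240: address unchanged


address = (2, 3)
address, message = decode_input(address, "right")
```

After the address is updated, the server would adjust the view (step 250) and await the next input. As noted above, the order in which the inputs are tested is not significant, which is why a table lookup is an acceptable substitute for the sequential tests of FIG. 5.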

In an alternate embodiment, the server 18 continues to update the node address and adjust the view based on the received user input. For example, if the user input corresponded to “moving to the right”, then operation of the system 100 would continuously loop through steps 140, 150, and 250, checking for a different input. When the different input is received, the server 18 continuously updates the view accordingly.
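One way to picture this continuous mode is a loop that keeps applying the most recent direction until a different input arrives. The generator below is an illustrative sketch only; the name continuous_navigation, the command strings, and the convention that None means "no new input on this pass" are assumptions not found in the description.

```python
# Hypothetical command names mapped to (X, Z) adjustments of the node address.
MOVES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def continuous_navigation(start, inputs):
    """Yield the node address after each pass through the loop.

    inputs is an iterable in which None means no new input was received.
    """
    x, z = start
    direction = None
    for received in inputs:
        if received is not None:
            direction = received      # a different input replaces the old one
        if direction in MOVES:
            dx, dz = MOVES[direction]
            x, z = x + dx, z + dz     # update the address, then adjust the view
        yield (x, z)


path = list(continuous_navigation((0, 0), ["right", None, None, "up", None]))
```

In this sketch a single "right" input carries the viewer three nodes to the right before the "up" input takes over.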

It is to be understood that the foregoing user inputs, namely, to the right, to the left, up, and down, are merely general descriptions of movement through the array. Although the present invention is not so limited, in the present preferred embodiment, movement in each of these general directions is further defined based upon the user input.

Accordingly, FIG. 6 is a more detailed diagram of the operation of the system according to steps 140, 150, and 250 of FIG. 5. Moreover, it is to be understood that while FIG. 6 describes more detailed movement in one direction, i.e., to the right, the same detailed movement can be applied in any other direction. As illustrated, the determination of whether the user input corresponds to moving to the right actually involves several determinations. As described in detail below, these determinations include moving to the right through the array 10 at different speeds, moving to the right into a composited additional source output at different speeds, and having the user input overridden by the system 100.

The present invention allows a user 22 to navigate through the array 10 at different speeds. Depending on the speed (i.e., number of camera nodes traversed per unit of time) indicated by the user's input, such as movement of a pointing device (or other interface device), the server 18 will apply an algorithm that controls the transition between camera outputs either at critical speed (n nodes per unit of time), under critical speed (n−1 nodes per unit of time), or over critical speed (n+1 nodes per unit of time).

It is to be understood that speed of movement through the array 10 can alternatively be expressed as the time to switch from one camera 14 to another camera 14.

Specifically, as shown in step 140 a, the server 18 makes the determination whether the user input corresponds to moving to the right at a critical speed. The critical speed is preferably a predetermined speed of movement through the array 10 set by the system operator or designer depending on the anticipated environment being viewed. Further, the critical speed depends upon various other factors, such as focal length, distance between cameras, distance between the cameras and the viewed object, and the like. The speed of movement through the array 10 is controlled by the number of cameras 14 traversed in a given time period. Thus, the movement through the array 10 at critical speed corresponds to traversing some number, “n”, camera nodes per millisecond, or taking some amount of time, “s”, to switch from one camera 14 to another. It is to be understood that in the same embodiment the critical speed of moving through the array 10 in one dimension need not equal the critical speed of moving through the array in another dimension. Consequently, the server 18 increments the current node address along the X axis at n nodes per millisecond.

In the present preferred embodiment the user traverses twenty-four cameras 14 per second. As discussed above, a movie projector records twenty-four frames per second. Analogizing between the movie projector and the present invention, at critical speed the user traverses (and the server 18 switches between) approximately twenty-four cameras 14 per second, or a camera 14 approximately every 0.04167 seconds.

As shown in FIG. 6, the user 22 may advance not only at critical speed, but also at over the critical speed, as shown in step 140 b, or at under the critical speed, as shown in step 140 c. Where the user input “I” indicates movement through the array 10 at over the critical speed, the server 18 increments the current node address along the X axis by a unit of greater than n, for example, at n+1 nodes per millisecond. The step of incrementing the current node address at n+1 nodes per millisecond along the X axis is shown in step 150 b. Where the user input “I” indicates movement through the array 10 at under the critical speed, the server 18 proceeds to increment the current node address at a variable less than n, for example, at n−1 nodes per millisecond. This operation is shown as step 150 c.
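As a rough illustration of steps 150 a-c, the sketch below advances the X coordinate by a per-interval step that depends on the speed class, using the n, n+1, and n−1 examples given for those steps. The function names and the string labels for the speed classes are assumptions made for this example, not part of the described system.

```python
def node_step(speed_class, n):
    """Nodes traversed per time interval for each speed class (example values)."""
    steps = {
        "critical": n,     # step 150 a: n nodes per interval
        "over": n + 1,     # step 150 b: more than n, here n + 1
        "under": n - 1,    # step 150 c: fewer than n, here n - 1
    }
    return steps[speed_class]


def advance_right(x, speed_class, n, intervals=1):
    """Increment the current X node address for a number of elapsed intervals."""
    return x + node_step(speed_class, n) * intervals
```

Note that with n = 1 the "under" class yields a step of zero in this sketch; an actual system would presumably express slower movement as a longer time per camera instead.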

Scaleable Arrays

The shape of the array 10 can also be electronically scaled and the system 100 designed with a “center of gravity” that will ease a user's image path back to a “starting” or “critical position” node or ring of nodes, either when the user 22 releases control or when the system 100 is programmed to override the user's autonomy; that is to say, the active perimeter or geometry of the array 10 can be pre-configured to change at specified times or intervals in order to corral or focus attention in a situation that requires dramatic shaping. The system operator can, by real-time manipulation or via a pre-configured electronic proxy, sequentially activate or deactivate designated portions of the camera array 10. This is of particular importance in maintaining authorship and dramatic pacing in theatrical or entertainment venues, and also for implementing controls over how much freedom a user 22 will have to navigate through the array 10.

In the present embodiment, the system 100 can be programmed such that certain portions of the array 10 are unavailable to the user 22 at specified times or intervals. Thus, continuing with step 140 d of FIG. 6, the server 18 makes the determination whether the user input corresponds to movement to the right through the array but is subject to a navigation control algorithm. The navigation control algorithm causes the server 18 to determine, based upon navigation control factors, whether the user's desired movement is permissible.

More specifically, the navigation control algorithm, which is programmed in the server 18, determines whether the desired movement would cause the current node address to fall outside the permissible range of node coordinates. In the present embodiment, the permissible range of node coordinates is predetermined and depends upon the time of day, as noted by the server 18. Thus, in the present embodiment, the navigation control factors include time. As will be appreciated by those skilled in the art, permissible camera nodes and control factors can be correlated in a table stored in memory.
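One way to picture the table correlating control factors with permissible camera nodes is sketched below. The table contents, the hour-based granularity, and the function names are illustrative assumptions; the specification states only that such a correlation can be stored in memory.

```python
# Hypothetical table: (start_hour, end_hour, min_x, max_x).
# Each row gives the permissible X node range for hours in [start_hour, end_hour).
PERMISSIBLE_NODES = [
    (0, 12, 1, 30),    # morning: nodes 1 through 30 available
    (12, 24, 10, 20),  # afternoon and evening: only nodes 10 through 20
]


def permissible_range(hour):
    """Look up the permissible node range for the given hour of day (0-23)."""
    for start, end, lo, hi in PERMISSIBLE_NODES:
        if start <= hour < end:
            return lo, hi
    return None


def is_move_permitted(target_x, hour):
    """True when the desired node address falls inside the permissible range."""
    allowed = permissible_range(hour)
    return allowed is not None and allowed[0] <= target_x <= allowed[1]
```

Other navigation control factors, such as elapsed performance time or speed of movement, could key a similar table.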

In an alternate embodiment, the navigation control factors include time as measured from the beginning of a performance being viewed, also as noted by the server. In such an embodiment, the system operator can dictate from where in the array a user will view certain scenes. In another alternate embodiment, the navigation control factor is speed of movement through the array. For example, the faster a user 22 moves or navigates through the array, the wider the turns must be. In other alternate embodiments, the permissible range of node coordinates is not predetermined. In one embodiment, the navigation control factors and, therefore, the permissible range, are dynamically controlled by the system operator, who communicates with the server via an input device.

Having determined that the user input is subject to the navigation control algorithm, the server 18 further proceeds, in step 150 d, to increment the current node address along a predetermined path. By incrementing the current node address along a predetermined path, the system operator is able to corral or focus the attention of the user 22 to the particular view of the permissible cameras 14, thereby maintaining authorship and dramatic pacing in theatrical and entertainment venues.

In an alternate embodiment where the user input is subject to a navigation control algorithm, the server 18 does not move the user along a predetermined path. Instead, the server 18 merely awaits a permissible user input and holds the view at the current node. Only when the server 18 receives a user input resulting in a permissible node coordinate will the server 18 adjust the user's view.
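The two behaviors described for an impermissible request, following a predetermined path or holding the current view, can be contrasted in a short sketch. The `policy` labels, the `resolve_move` function, and the way the predetermined path is represented as a list of X coordinates are all assumptions made for illustration.

```python
def resolve_move(current_x, requested_x, allowed, policy, path=()):
    """Return the next X node address given a requested move.

    allowed is an inclusive (min_x, max_x) range of permissible nodes.
    policy is "path" (follow the predetermined path) or "hold" (keep the view).
    """
    lo, hi = allowed
    if lo <= requested_x <= hi:
        return requested_x            # permissible: honor the user's request
    if policy == "hold" or not path:
        return current_x              # hold the view at the current node
    if current_x in path:
        i = path.index(current_x)
        # Advance one node along the predetermined path, stopping at its end.
        return path[min(i + 1, len(path) - 1)]
    return path[0]                    # not yet on the path: join it at its start
```

For example, with `allowed=(10, 20)` a request for node 25 from node 12 returns 12 under the "hold" policy, and returns the node following 12 on the supplied path under the "path" policy.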

Additional Source Output/Throughput

In addition to moving through the array 10, the user 22 may, at predetermined locations in the array 10, choose to leave the real world environment being viewed. More specifically, additional source outputs, such as computer graphic imagery, virtual world camera views and virtual world grid data, virtual world imagery, virtual objects and their grid positioning data, applets, sprites, avatar representations, film clips, animation, augmented reality objects or images or recordings of real-world objects and other artificial and real camera outputs, are made available to the user 22. In one embodiment, the additional source output is composited with the view of the real environment. In an alternate embodiment, the user's view transfers completely from the real environment to that offered by the additional source output.

More specifically, the additional source output is stored (preferably in digital form) in the electronic storage device 20. Upon the user 22 inputting a desire to view the additional source output, the server 18 transmits the additional source output to the user interface/display device 24. In the present embodiment, the server 18 simply transmits the additional source output to the user display device 24. In an alternate embodiment, the server 18 first composites the additional source output with the camera output and then transmits the composited signal to the user interface/display device 24.

As shown in step 140 e, the server 18 makes the determination whether the user input corresponds to moving in the array into the source output. If the user 22 decides to move (or the system is configured to cause the user 22 to move) into the additional source output, the server 18 adjusts the view by, for example, substituting the additional source output for the updated camera output identified in either of steps 150 a-d.

The additional source output may include multiplexed, composited (using blue screen, green screen, or alpha techniques), or layered output from the group of various inputs and/or outputs including: computer graphic imagery, virtual world camera views and virtual world grid data, virtual world imagery, virtual objects and their grid positioning data, applets, sprites, avatar representations, film clips, animation, augmented reality objects or images or recordings of real-world objects. The system may present the additional source output, alone or in combination with the camera output, for example, by mosaicing, mixing, layering or multiplexing it.

The additional source output may be aligned and registered with real world camera views along the user's perspective motion path, as the user's viewpoint moves from camera to camera. This alignment or registration of a real world camera view with a virtual world view can be aided by camera orientation and pointing platforms (such as a spherical ultrasonic motor or a spherical magnetic stepper motor or other such devices on the physical camera array side, and virtual camera navigation algorithms on the virtual world side) along with protocol handshakes between the camera array operating system and the virtual world operating system. The alignment can be triggered or guided by transceivers embedded in real world environmental sensors, such as radio frequency identification (RFID) tags worn by event actors or embedded in event objects, proximity sensors, infrared heat sensors, motion sensors, sound and voice sensors, and vibration sensors, and further aided by accelerometers and/or transceivers riding on the cameras themselves or the camera support structure. The alignment can also be aided by windowing and image repositioning within the camera's field of view.

The system may also be configured to permit the convergence of real world perspective paths with virtual world perspective paths and the seamless continuation of such paths. For example, where the additional source output is imagery and data from a virtual world environment relevant to a user-guided virtual camera path, such as camera animation algorithms and data from wire mesh geometries, and where the real-world perspective motion path (the path that progresses through the overlapping fields of view of the physical camera array) transitions to a virtual world camera path, the user's path along the real world camera array will transition fluidly and seamlessly into a continuing path in the virtual world, and subsequently along virtual world camera paths that are controlled by the user. The motion path transition between worlds can be effected by a number of methods, including camera sequencing that employs the same techniques used to move the viewing path through the physical array (and where the first virtual camera view is treated as if it were the next physical camera view in the physical array). It can also employ other techniques, including multiplexing, layering, or transitional visual effects, e.g., a simple dissolve. The transition from the viewing path along the physical array of cameras into a navigable virtual world camera path is novel and requires software APIs (application programming interfaces) on both sides of the equation for the viewing path to be tightly convergent and continuous. The system of one embodiment identifies some real world feature that could be made common to both worlds, a doorway, for example, or some architectural element. Since the virtual world is completely malleable, a replica of the real world feature common to both worlds could be constructed using building tools in the virtual environment and based, in one embodiment, on a 3D laser map (or the data from some other metrology tool familiar to those in the art) of the real-world topology; this virtual world object or feature would be subsequently scaled, rotated, and aligned with the feature in the physical array camera views to provide the visual transition ramp. The method of transition could be as simple as a dissolve (or any method that would equate with the camera array navigation process) and would be supported by a protocol handshake between the camera array operating system and the virtual world operating system (for example, an open-source simulation environment). Windowing, digital zooming and repositioning within the physical camera sensors would provide another layer of adjustment and refinements for finessing the transitions.

In certain embodiments, the system links real and virtual world paths in a continuum that can scale to a “3D Internet”. In such embodiments, the additional source output may be the imagery and data from a virtual world camera path, wherein the egress from that virtual world camera path is a transition to a real-world navigable camera array (or some other interim virtual world path, or any alternative, physical camera array path); thus, a continuum is established for making the Internet a continuously navigable spatial network of paths, based on transitions between the user-guided perspective motion paths along navigable real world camera arrays and the user-guided camera paths in virtual worlds. In essence, virtual worlds thus become the connecting tissue between separated real world environments (e.g., transitioning from a first real world camera array and associated environment to a second real world camera array and associated environment) and vice versa, and real world environments can be “nested” in the “shells” of virtual worlds and vice versa.

The system programming, including the above-referenced APIs, specifies and controls the objects and elements in a given environment and allows the code to function in specific ways. For example, the functional interface in a virtual world environment that allows the user to bring a real world video stream (link) into a virtual world space and map it onto objects or the background plane may be such an API. Such functionalities already exist in virtual worlds including that under the tradename of SECOND LIFE. They enable real video to be displayed (e.g., mapped) on a virtual object that can be resized in real time and/or mapped on the virtual world's background plane (e.g., so that the virtual world, from the user's camera perspective (and thus corresponding camera output), is enveloped by the real world video). Wherever the user navigates or whatever direction the user turns or looks, the camera perspective is the background plane. The reverse mapping is also within the scope of the present invention, namely mapping virtual worlds (or other additional source output) into the real world camera perspective (output), for example, onto real objects or substituting a real background plane with a virtual one.

In such embodiments, the mapping of an image, or sequence of images, in various ways on the virtual world plane or on objects in it involves aligning the digital video imagery from the real world camera path with the digital imagery from the virtual world camera path, or overlaying one on top of another, to create the illusion that a real world path was merging into a virtual camera path and vice versa. Consequently, the user experiences a continuous motion path, bridging from a real world environment into a virtual world environment (or vice versa) or into a combination real and virtual world environment; that experience is mediated by the system software that analyzes and controls the real and virtual camera paths on both sides of the transition, making adjustments to align and lock or synchronize those paths, and create entry and exit “ramps” that would hide any “stalls” caused by handshakes and protocols negotiated between the different domains.

Different techniques may be used to align or overlay the real and virtual worlds, including those utilized in movie film special effects, depending on what creative tools the system designer decides best promotes the effect, including the scaling and aligning of objects or features that are common to both worlds. The latter approach means that real world objects are reproduced in the virtual world and scaled and aligned with the object or feature as it appears in the last frame of the real world video so that those objects or features are “extended” into the virtual environment. An example would be a hallway or a tunnel or a path with, perhaps, various replicated inanimate objects repeating from the real world environment and continuing into the virtual one to suggest the extension of the space. Such computer graphics imagery (CGI) special effects may thus build on the last frame of actual film footage when transitioning to digital effects.

Although tools and processes for CGI effects are known, as will be appreciated by those skilled in the art, such embodiments of the present invention have the benefit of a guided motion path, initiated by the user, but intermediated by system software, that will bridge between real and virtual worlds by bringing together the user-guided camera paths on both sides of the transition.

Thus, the system provides the ability to align and lock a navigable video user path (that is, a sequence of digital video frames controlled by the user) with a separate individual camera path in a virtual world (a frame sequence in the virtual world environment also controlled by the user). Such “aligning and locking” may entail establishing specific correlations between the apparent motion of a specific real world navigation path and the apparent motion of a specific virtual camera path that moves across the grid of a virtual world.

Notably, in certain embodiments, the overlaid camera perspectives (e.g., real camera output and virtual camera output), which can be thought of as on and off ramps between the real and virtual environments, may be held in buffer or other system memory for efficient retrieval, thereby circumventing the latency that may be encountered by having to “log” or go through handshake protocols between one environment (server) and another. The end result would be a continuous navigation experience with no stalls or stops due to negotiating access from one domain into another.

It is important to note that if the end user is given options for moving through a navigable video camera array (i.e., speed and path direction in approaching a “transition point”), only one side of the equation will be known prior to the transition: the characteristics of the user's camera path before the transition, specifically, how fast the user is moving through the array and from what direction the user is approaching the transition point. Thus, in certain embodiments, the characteristics of the camera path (for example, in the virtual world on the other side of the real-to-virtual world transition) are extrapolated to match those path characteristics. Such characteristics may include direction, speed, orientation and other characteristics of the user's navigation. For example, if the background is moving left to right, or right to left, that orientation and flow ideally is matched and continued (at least momentarily through the transition) in the virtual world camera movement.

The motion along the user path can be characterized, in one embodiment, by the concept of “apparent speed” and/or “apparent direction” (e.g., relative to the background). Thus, the orientation of the camera field of view relative to the direction of the motion path through the local environment, and the distance of objects from the camera (for example, whether the camera is perpendicular or parallel to the line of movement, or whether there are near-field objects or far-field objects in view) have a bearing on the perception of speed.

Of course, in simpler embodiments, the user is not given options for such characteristics, so by default the real and virtual world navigation characteristics may be matched.

If transitioning from a virtual world environment to a real world environment, the adjustments to the user's path through the real camera array might be controlled by a system “override” (i.e., overriding the user inputs), which temporarily commandeers the user's control and automatically feathers the speed and direction of the user's path and flow through the real-virtual/virtual-real camera transition (and after the transition period, cedes control back to the user). If transitioning from a real world camera array to a user's individual virtual world camera, adjustments to the user's path in the virtual world may be imposed through the transition by an algorithm that temporarily controls the camera path and which, subsequent to ushering the user through the transition, cedes control back to the user to either stop or move the camera ahead in any direction desired. Control methods include adjusting the path speed or direction through the camera nodes, varying the frame rate, “double printing” (to use an old film term) or duplicating and repeating camera fields, deleting in-between frames, or setting virtual world camera animation functions to expand or compress the time line.
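Two of the control methods listed above, duplicating frames (“double printing”) and deleting in-between frames, can be sketched as a single retiming function. This is a simplified nearest-frame resampling chosen for illustration; the `retime` name and the `factor` parameter are assumptions, and the specification does not describe a particular retiming algorithm.

```python
def retime(frames, factor):
    """Expand (factor > 1) or compress (factor < 1) a frame sequence.

    Expansion repeats frames; compression drops in-between frames.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    out_len = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    # Map each output position back to the nearest earlier source frame.
    return [frames[min(last, int(i / factor))] for i in range(out_len)]
```

With a factor of 2, each frame of a three-frame sequence appears twice; with a factor of 0.5, every other frame of a four-frame sequence is dropped.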

This correlation between paths in both worlds could also be driven by a “grid” correspondence, whereby the software enables a visual grid to be superimposed on the topology of the real world camera arrays and referenced to various grid schemes in virtual worlds. In this embodiment, the frequency of grid lines navigated (representing distances across camera fields of view) and also the movement of in situ objects in the local environment moving across the grid (representing near-field or far field objects and the direction of movement) provide the data for shaping the virtual world camera path through the transition and perhaps just beyond it. In the reverse scenario (and since the virtual world camera paths offer more flexibility than the real world array paths), the virtual world ramp into the transition will be shaped according to the path flexibility on the real world array side (that is, whether there is more than one array path diverging from the transition exit).

Finally, since (in some embodiments) there may be a latency factor in moving the user experience from one server to another (in logging a user from a navigable video server in the “cloud” to another virtual world server), APIs in the real world Navigable Video System and APIs in software add-ons for virtual world “users” (the GUIs that control the user experience and give users options for how to view and move through the virtual grid) would facilitate the automatic pre-build of camera-path “ramps” for each respective side of the transition. These ramps (based on the path options and specific characteristics of each local world) would enable the expansion or compression of the time it takes to traverse camera nodes leading up to the transition points. They would be applied relative to the closure of the handshake between servers, so that the transition could be a fluid and continuous motion path and not be interrupted by administrative protocols or Internet congestion. Thus, when a request is made to navigate between worlds via marked transition points, a pre-built motion ramp is activated and the software intermediates to control the flow of the transition as it monitors the progress of any log-in or handshake process behind the scenes.

It should be noted that embodiments of the present invention are not limited to any particular type of transition or system implementation for a transition, if a transition is even provided.

Continuing with the process flow, once the current node address is updated in either of steps 150 a-d, the server 18 proceeds to adjust the user's view in step 250. When adjusting the view, the server 18 “mixes” the existing or current camera output being displayed with the output of the camera 14 identified by the updated camera node address. Mixing the outputs is achieved differently in alternate embodiments of the invention. In the present embodiment, mixing the outputs involves electronically switching at a particular speed from the existing camera output to the output of the camera 14 having the new current node address (or to the additional source output).

It is to be understood that in this and other preferred embodiments disclosed herein, the camera outputs are synchronized. As is well known in the art, a synchronizing signal from a “sync generator” is supplied to the cameras. The sync generator may take the form of those used in video editing and may comprise, in alternate embodiments, part of the server, the hub, and/or a separate component coupled to the array.

As described above, in the current embodiment, at critical speed, the server 18 switches camera outputs approximately at a rate of 24 per second, or one every 0.04167 seconds. If the user 22 is moving through the array 10 at under the critical speed, the outputs of the intermediate cameras 14 are each displayed for a relatively longer duration than if the user is moving at the critical speed. Similarly, each output is displayed for a relatively shorter duration when a user navigates at over the critical speed. In other words, the server 18 adjusts the switching speed based on the speed of the movement through the array 10.
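The relationship between navigation speed and display duration is simply a reciprocal, as the short sketch below illustrates. The 24 cameras per second figure comes from the text; the 12 and 48 cameras per second values are arbitrary examples of under and over critical speed, and the function name is hypothetical.

```python
def dwell_seconds(cameras_per_second):
    """Time each camera output is displayed at a given traversal rate."""
    if cameras_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / cameras_per_second


critical = dwell_seconds(24)   # critical speed from the text
under = dwell_seconds(12)      # assumed example of under critical speed
over = dwell_seconds(48)       # assumed example of over critical speed

print(round(critical, 5))  # 0.04167
print(round(under, 5))     # 0.08333
print(round(over, 5))      # 0.02083
```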

Of course, it is to be understood that in a simplified embodiment of the present invention, the user may navigate at only the critical speed.

In another alternate embodiment, mixing the outputs is achieved by compositing the existing or current output and the updated camera node output. In yet another embodiment, mixing involves dissolving the existing view into the new view. In still another alternate embodiment, mixing the outputs includes adjusting the frame refresh rate of the user display device. Additionally, based on speed of movement through the array, the server may add motion blur to convey the realistic sense of speed.
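A dissolve of the existing view into the new view can be illustrated as a linear cross-fade. In this sketch, frames are modeled as flat lists of pixel intensities and the blend is linear in the step count; both choices, along with the `dissolve` name, are assumptions for illustration rather than details taken from the specification.

```python
def dissolve(old_frame, new_frame, steps):
    """Return `steps` blended frames fading from old_frame to new_frame.

    The final returned frame equals new_frame.
    """
    if len(old_frame) != len(new_frame):
        raise ValueError("frames must have the same number of pixels")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    blended = []
    for k in range(1, steps + 1):
        a = k / steps  # weight of the new view, rising from 1/steps to 1
        blended.append([(1 - a) * o + a * n for o, n in zip(old_frame, new_frame)])
    return blended
```

For instance, a two-step dissolve between `[0.0, 100.0]` and `[100.0, 0.0]` passes through `[50.0, 50.0]` before arriving at the new frame.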

In yet another alternate embodiment, the server causes a black screen to be viewed instantaneously between camera views. Although not always advantageous, such black screens reduce the physiologic “carrying over” of one view into a subsequent view.

It is to be understood that the user inputs corresponding to movements through the array at different speeds may include different keystrokes on a keypad, different positions of a joystick, positioning a joystick in a given position for a predetermined length of time, and the like. Similarly, the decision to move into an additional source output may be indicated by a particular keystroke, joystick movement, or the like (including optical, infrared, gesture driven, voice-activated, biofeedback-initiated, multi-touch or haptic interface controllers).

In another embodiment, mixing may be accomplished by “mosaicing” the outputs of the intermediate cameras 14. U.S. Pat. No. 5,649,032 entitled System For Automatically Aligning Images To Form A Mosaic Image to Peter J. Burt et al. discloses a system and method for generating a mosaic from a plurality of images and is hereby incorporated by reference. The server 18 automatically aligns one camera output to another camera output, a camera output to another mosaic (generated from previously occurring camera output) such that the output can be added to the mosaic, or an existing mosaic to a camera output.

Once the mosaic alignment is complete, the present embodiment utilizes a mosaic composition process to construct (or update) a mosaic. The mosaic composition comprises a selection process and a combination process. The selection process automatically selects outputs for incorporation into the mosaic and may include masking and cropping functions to select the region of interest in a mosaic. Once the selection process selects which output(s) are to be included in the mosaic, the combination process combines the various outputs to form the mosaic. The combination process applies various output processing techniques, such as merging, fusing, filtering, output enhancement, and the like, to achieve a seamless combination of the outputs. The resulting mosaic is a smooth view that combines the constituent outputs such that temporal and spatial information redundancy are minimized in the mosaic. In one embodiment of the present invention, the mosaic may be formed as the user moves through the system (on the fly) and the output image displayed close to real time. In another embodiment, the system may form the mosaic from a predetermined number of outputs or during a predetermined time interval, and then display the images pursuant to the user's navigation through the environment.

In yet another embodiment, the server 18 enables the output to be mixed by a “tweening” process. One example of the tweening process is disclosed in U.S. Pat. No. 5,259,040 entitled Method For Determining Sensor Motion And Scene Structure And Image Processing System Therefor to Keith J. Hanna, herein incorporated by reference. Tweening enables the server 18 to process the structure of a view from two or more camera outputs of the view.

Applying the Hanna patent to the telepresence method/system herein, tweening is now described. The server monitors the movement among the intermediate cameras 14 through a scene using local scene characteristics such as brightness derivatives of a pair of camera outputs. A global camera output movement constraint is combined with a local scene characteristic constancy constraint to relate local surface structures with the global camera output movement model and local scene characteristics. The method for determining a model for global camera output movement through a scene and scene structure model of the scene from two or more outputs of the scene at a given image resolution comprises the following steps:

(a) setting initial estimates of local scene models and a global camera output movement model;
(b) determining a new value of one of the models by minimizing the difference between the measured error in the outputs and the error predicted by the model;
(c) resetting the initial estimates of the local scene models and the image sensor motion model using the new value of one of the models determined in step (b);
(d) determining a new value of the second of the models using the estimates of the models determined in step (b) by minimizing the difference between the measured error in the outputs and the error predicted by the model;
(e) warping one of the outputs towards the other output using the current estimates of the models at the given image resolution; and
(f) repeating steps (b), (c), (d) and (e) until the differences between the new values of the models and the values determined in the previous iteration are less than a certain value or until a fixed number of iterations have occurred.

It should be noted that where the Hanna patent effectuates the tweening process by detecting the motion of an image sensor (e.g., a video camera), an embodiment of the present invention monitors the user movement among live cameras or storage nodes.

In an alternate embodiment, although not always necessary, to ensure a seamless progression of views, the server 18 also transmits to the user display device 24 outputs from some or all of the intermediate cameras, namely those located between the current camera node and the updated camera node. Such an embodiment will now be described with reference to FIGS. 7a-7g. Specifically, FIG. 7a illustrates a curvilinear portion of an array 10 that extends along the X axis or to the left and right from the user's perspective. Thus, the coordinates that the server 18 associates with the cameras 14 differ only in the X coordinate. More specifically, for purposes of the present example, the cameras 14 can be considered sequentially numbered, starting with the left-most camera 14 being the first, i.e., number “1”. The X coordinate of each camera 14 is equal to the camera's position in the array. For illustrative purposes, particular cameras will be designated 14-X, where X equals the camera's position in the array 10 and, thus, its associated X coordinate.

In general, FIGS. 7a-7g illustrate possible user movement through the array 10. The environment to be viewed includes three objects 602, 604, 606, the first and second of which include numbered surfaces. As will be apparent, these numbered surfaces allow a better appreciation of the change in user perspective.

In FIG. 7a, six cameras 14-2, 14-7, 14-11, 14-14, 14-20, 14-23 of the array 10 are specifically identified. The boundaries of each camera's view are identified by the pair of lines 14-2a, 14-7a, 14-11a, 14-14a, 14-20a, 14-23a, radiating from each identified camera 14-2, 14-7, 14-11, 14-14, 14-20, 14-23, respectively. As described below, in the present example the user 22 navigates through the array 10 along the X axis such that the images or views of the environment are those corresponding to the identified cameras 14-2, 14-7, 14-11, 14-14, 14-20, 14-23.

The present example provides the user 22 with the starting view from camera 14-2. This view is illustrated in FIG. 7b. The user 22, desiring to have a better view of the object 702, pushes the “7” key on the keyboard. This user input is transmitted to and interpreted by the server 18.

Because the server 18 has been programmed to recognize the “7” key as corresponding to moving or jumping through the array to camera 14-7, the server 18 changes the X coordinate of the current camera node address to 7, selects the output of camera 14-7, and adjusts the view or image sent to the user 22. Adjusting the view, as discussed above, involves mixing the outputs of the current and updated camera nodes. Mixing the outputs, in turn, involves switching intermediate camera outputs into the view to achieve the seamless progression of the discrete views of cameras 14-2 through 14-7, which gives the user 22 the look and feel of moving around the viewed object. The user 22 now has another view of the first object 702. The view from camera 14-7 is shown in FIG. 7c. As noted above, if the jump in camera nodes is greater than a predetermined limit, the server 18 would omit some or all of the intermediate outputs.

Pressing the “right arrow” key on the keyboard, the user 22 indicates to the system 100 a desire to navigate to the right at critical speed. The server 18 receives and interprets this user input as indicating such and increments the current camera node address by n=4. Consequently, the updated camera node address is 14-11. The server 18 causes the mixing of the output of camera 14-11 with that of camera 14-7. Again, this includes switching into the view the outputs of the intermediate cameras (i.e., 14-8, 14-9, and 14-10) to give the user 22 the look and feel of navigating around the viewed object. The user 22 is thus presented with the view from camera 14-11, as shown in FIG. 7d.

Still interested in the first object 702, the user 22 enters a user input, for example, “alt-right arrow,” indicating a desire to move to the right at less than critical speed. Accordingly, the server 18 increments the updated camera node address by n−1 nodes, namely 3 in the present example, to camera 14-14. The outputs from cameras 14-11 and 14-14 are mixed, and the user 22 is presented with a seamless view associated with cameras 14-11 through 14-14. FIG. 7e illustrates the resulting view of camera 14-14.

With little to see immediately after the first object 702, the user 22 enters a user input such as “shift-right arrow,” indicating a desire to move quickly through the array 10, i.e., at over the critical speed. The server 18 interprets the user input and increments the current node address by n+2, or 6 in the present example. The updated node address thus corresponds to camera 14-20. The server 18 mixes the outputs of cameras 14-14 and 14-20, which includes switching into the view the outputs of the intermediate cameras 14-15 through 14-19. The resulting view of camera 14-20 is displayed to the user 22. As shown in FIG. 7f, the user 22 now views the second object 704.

Becoming interested in the third object 704, the user 22 desires to move slowly through the array 10. Accordingly, the user 22 enters “alt-right arrow” to indicate moving to the right at below critical speed. Once the server 18 interprets the received user input, it updates the current camera node address along the X axis by 3 to camera 14-23. The server 18 then mixes the outputs of cameras 14-20 and 14-23, thereby providing the user 22 with a seamless progression of views through camera 14-23. The resulting view 14-23a is illustrated in FIG. 7g.
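The node arithmetic of the foregoing example may be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual programming: the names `CRITICAL_STEP`, `STEP_BY_INPUT` and `navigate`, the key strings, and the default `jump_limit` value are all hypothetical. Only the increments (n=4, n−1 and n+2) and the direct jump on a digit key come from the example above.

```python
from typing import List, Tuple

CRITICAL_STEP = 4  # "n" in the example: nodes advanced at critical speed

# Hypothetical key names mapped to the increments used in the example.
STEP_BY_INPUT = {
    "right": CRITICAL_STEP,            # critical speed
    "alt-right": CRITICAL_STEP - 1,    # below critical speed (n-1 = 3)
    "shift-right": CRITICAL_STEP + 2,  # above critical speed (n+2 = 6)
}


def navigate(current: int, user_input: str, jump_limit: int = 10) -> Tuple[int, List[int]]:
    """Return the updated camera node and the intermediate nodes to switch in."""
    if user_input.isdigit():
        updated = int(user_input)      # digit key: jump directly to that camera
    else:
        updated = current + STEP_BY_INPUT[user_input]
    step = 1 if updated >= current else -1
    intermediates = list(range(current + step, updated, step))
    # Assumed policy: drop every intermediate output when the jump exceeds the limit.
    if abs(updated - current) > jump_limit:
        intermediates = []
    return updated, intermediates


# Replay the walk of FIGS. 7b-7g, starting at camera 14-2.
node = 2
path = []
for key in ["7", "right", "alt-right", "shift-right", "alt-right"]:
    node, between = navigate(node, key)
    path.append((node, between))
```

Replaying the five inputs visits cameras 14-7, 14-11, 14-14, 14-20 and 14-23 in turn, with cameras 14-8 through 14-10 switched in on the second move, matching the narrative above.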

Other Data Devices

It is to be understood that devices other than cameras may be interspersed in the array. These other devices, such as RFID, motion capture cameras, reflective devices, make-up and systems, motion sensors and microphones, provide data to the server(s) for processing. For example, in alternate embodiments output from motion sensors or microphones is fed to the server(s) and used to scale the array. More specifically, permissible camera nodes (as defined in a table stored in memory) are those near the sensor or microphone having a desired output, e.g., where there is motion or sound. As such, navigation control factors include output from other such devices. Alternatively, the output from the sensors or microphones is provided to the user. Furthermore, data received from any of such other data devices may be used as a trigger to transition to or otherwise provide additional source output to users or may be used in conjunction with the additional source output (e.g., using real world sounds captured from microphones in conjunction with a virtual world depiction).

An alternate embodiment in which the array of cameras includes multiple microphones interspersed among the viewed environment and the cameras will now be described with reference to FIG. 8. The system 800 generally includes an array of cameras 802 coupled to a server 804, which, in turn, is coupled to one or more user interface and display devices 806 and an electronic storage device 808. A hub 810 collects and transfers the outputs from the array 802 to the server 804. More specifically, the array 802 comprises modular rails 812 that are interconnected. Each rail 812 carries multiple microcameras 814 and a microphone 816 centrally located on the rail 812. Additionally, the system 800 includes microphones 818 that are physically separate from the array 802. The outputs of both the cameras 814 and microphones 816, 818 are coupled to the server 804 for processing.

In general, operation of the system 800 proceeds as described with respect to system 100 of FIGS. 1-2d and 5-6. Beyond the operation of the previously described system 100, however, the server 804 receives the sound output from the microphones 816, 818 and, as with the camera output, selectively transmits sound output to the user. As the server 804 updates the current camera node address and changes the user's view, it also changes the sound output transmitted to the user. In the present embodiment, the server 804 has stored in memory a range of camera nodes associated with a given microphone, namely the cameras 814 on each rail 812 are associated with the microphone 816 on that particular rail 812. In the event a user attempts to navigate beyond the end of the array 802, the server 804 determines the camera navigation is impermissible and instead updates the microphone node output to that of the microphone 818 adjacent to the array 802.

In an alternate embodiment, the server 804 might include a database in which camera nodes in a particular area are associated with a given microphone. For example, the camera nodes within the rectangular volume defined by the (X, Y, Z) coordinates (0,0,0), (10,0,0), (10,5,0), (0,5,0), (0,0,5), (10,0,5), (10,5,5) and (0,5,5) are associated with a given microphone. It is to be understood that selecting one of the series of microphones based on the user's position (or view) in the array provides the user with a sound perspective of the environment that coincides with the visual perspective.
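One way such a database lookup might be arranged is sketched below. The `MicRegion` class, the `microphone_for_node` function and the identifier "mic-A" are hypothetical names introduced for illustration. Only the corner coordinates are taken from the example above, and the choice to treat boundary nodes as inside the region is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Node = Tuple[int, int, int]  # (X, Y, Z) camera node address


@dataclass(frozen=True)
class MicRegion:
    """An axis-aligned region of camera nodes served by one microphone."""
    mic_id: str
    lo: Node  # corner with the smallest X, Y and Z
    hi: Node  # corner with the largest X, Y and Z

    def contains(self, node: Node) -> bool:
        # Assumption: nodes lying on the boundary belong to the region.
        return all(l <= c <= h for l, c, h in zip(self.lo, node, self.hi))


def microphone_for_node(node: Node, regions: List[MicRegion]) -> Optional[str]:
    """Return the microphone for the first region containing the node, if any."""
    for region in regions:
        if region.contains(node):
            return region.mic_id
    return None


# The region whose eight corners are listed in the example above.
REGIONS = [MicRegion("mic-A", lo=(0, 0, 0), hi=(10, 5, 5))]

inside = microphone_for_node((3, 2, 1), REGIONS)
outside = microphone_for_node((11, 0, 0), REGIONS)
```

Here `inside` resolves to "mic-A" and `outside` to `None`. A fuller table would list one region per microphone, so that the sound output changes as the current camera node address crosses from one region to the next.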

It is to be understood that the server of the embodiments discussed above may take any of a number of known configurations. Two examples of server configurations suitable for use with the present invention will be described with reference to FIGS. 9 and 10. Turning first to FIG. 9, the server 902, electronic storage device 20, array 10, users (1, 2, 3, . . . N) 22-1 through 22-N, and associated user interface/display devices 24-1 through 24-N are shown therein.

The server 902 includes, among other components, a processing means in the form of one or more central processing units (CPU) 904 coupled to associated read only memory (ROM) 906 and a random access memory (RAM) 908. In general, ROM 906 is for storing the program that dictates the operation of the server 902, and the RAM 908 is for storing variables and values used by the CPU 904 during operation. Also coupled to the CPU 904 are the user interface/display devices 24. It is to be understood that the CPU may, in alternate embodiments, comprise several processing units, each performing a discrete function.

Coupled to both the CPU 904 and the electronic storage device 20 is a memory controller 910. The memory controller 910, under direction of the CPU 904, controls accesses (reads and writes) to the storage device 20. Although the memory controller 910 is shown as part of the server 902, it is to be understood that it may reside in the storage device 20.

During operation, the CPU 904 receives camera outputs from the array 10 via bus 912. As described above, the CPU 904 mixes the camera outputs for display on the user interface/display device 24. Which outputs are mixed depends on the view selected by each user 22. Specifically, each user interface/display device 24 transmits across bus 914 the user inputs that define the view to be displayed. Once the CPU 904 mixes the appropriate outputs, it transmits the resulting output to the user interface/display device 24 via bus 916. As shown, in the present embodiment, each user 22 is independently coupled to the server 902.

The bus 912 also carries the camera outputs to the storage device 20 for storage. When storing the camera outputs, the CPU 904 directs the memory controller 910 to store the output of each camera 14 in a particular location of memory in the storage device 20.

When the image to be displayed has previously been stored in the storage device 20, the CPU 904 causes the memory controller 910 to access the storage device 20 to retrieve the appropriate camera output. The output is thus transmitted to the CPU 904 via bus 918 where it is mixed. Bus 918 also carries additional source output to the CPU 904 for transmission to the users 22. As with outputs received directly from the array 10, the CPU 904 mixes these outputs and transmits the appropriate view to the user interface/display device 24.

FIG. 10 shows a server configuration according to an alternate embodiment of the present invention. As shown therein, the server 1002 generally comprises a control central processing unit (CPU) 1004, a mixing CPU 1006 associated with each user 22, and a memory controller 1008. The control CPU 1004 has associated ROM 1010 and RAM 1012. Similarly, each mixing CPU 1006 has associated ROM 1014 and RAM 1016.

To achieve the functionality described above, the camera outputs from the array 10 are coupled to each of the mixing CPUs 1 through N 1006-1, 1006-N via bus 1018. During operation, each user 22 enters inputs in the interface/display device 24 for transmission (via bus 1020) to the control CPU 1004. The control CPU 1004 interprets the inputs and, via buses 1022-1, 1022-N, transmits control signals to the mixing CPUs 1006-1, 1006-N instructing them which camera outputs received on bus 1018 to mix. As the name implies, the mixing CPUs 1006-1, 1006-N mix the outputs in order to generate the appropriate view and transmit the resulting view via buses 1024-1, 1024-N to the user interface/display devices 24-1, 24-N.

In an alternate related embodiment, each mixing CPU 1006 multiplexes outputs to more than one user 22. Indications of which outputs are to be mixed and transmitted to each user 22 come from the control CPU 1004.

The bus 1018 couples the camera outputs not only to the mixing CPUs 1006-1, 1006-N, but also to the storage device 20. Under control of the memory controller 1008, which in turn is controlled by the control CPU 1004, the storage device 20 stores the camera outputs in known storage locations. Where user inputs to the control CPU 1004 indicate a user's 22 desire to view stored images, the control CPU 1004 causes the memory controller 1008 to retrieve the appropriate images from the storage device 20. Such images are retrieved into the mixing CPUs 1006 via bus 1026. Additional source output is also retrieved to the mixing CPUs 1006-1, 1006-N via bus 1026. The control CPU 1004 also passes control signals to the mixing CPUs 1006-1, 1006-N to indicate which outputs are to be mixed and displayed.

Stereoscopic Views

It is to be understood that it is within the scope of the present invention to employ stereoscopic views of the environment. To achieve the stereoscopic view, the system retrieves from the array (or the electronic storage device) and simultaneously transmits to the user at least portions of outputs from two cameras. The server processing element mixes these camera outputs to achieve a stereoscopic output. Each view provided to the user is based on such a stereoscopic output. In one stereoscopic embodiment, the outputs from two adjacent cameras in the array are used to produce one stereoscopic view. Using the notation of FIGS. 7a-7g, one view is the stereoscopic view from cameras 14-1 and 14-2. The next view is based on the stereoscopic output of cameras 14-2 and 14-3 or two other cameras. Thus, in such an embodiment, the user is provided the added feature of a stereoscopic seamless view of the environment.

Multiple Users

As described above, the present invention allows multiple users to simultaneously navigate through the array independently of each other. To accommodate multiple users, the systems described above distinguish between inputs from the multiple users and select a separate camera output appropriate to each user's inputs. In one such embodiment, the server tracks the current camera node address associated with each user by storing each node address in a particular memory location associated with that user. Similarly, each user's input is differentiated and identified as being associated with the particular memory location with the use of message tags appended to the user inputs by the corresponding user interface device.

In an alternate embodiment, two or more users may choose to be linked, thereby moving in tandem and having the same view of the environment. In such an embodiment, a user input includes identifying another user by his/her code to serve as a “guide”. In operation, the server provides the outputs and views selected by the guide user to both the guide and the other user selecting the guide. Another user input causes the server to unlink the users, thereby allowing each user to control his/her own movement through the array.
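The per-user bookkeeping and the guide linkage described above might be arranged as in the following sketch. The class `NavigationSessions` and its method names are hypothetical. Two behaviors are assumptions not stated in the text: a linked user's own movement inputs are ignored while linked, and an unlinked user resumes from the guide's current node.

```python
from typing import Dict


class NavigationSessions:
    """Track one current camera node address per tagged user."""

    def __init__(self) -> None:
        self._node: Dict[str, int] = {}   # user tag -> current camera node
        self._guide: Dict[str, str] = {}  # follower tag -> guide tag

    def join(self, user_tag: str, start_node: int) -> None:
        self._node[user_tag] = start_node

    def handle_move(self, user_tag: str, delta: int) -> None:
        # The message tag on each input selects that user's stored address.
        if user_tag in self._guide:
            return  # assumption: a linked follower's own moves are ignored
        self._node[user_tag] += delta

    def link(self, follower_tag: str, guide_tag: str) -> None:
        self._guide[follower_tag] = guide_tag

    def unlink(self, follower_tag: str) -> None:
        guide_tag = self._guide.pop(follower_tag, None)
        if guide_tag is not None:
            # Assumption: the follower resumes from where the guide is now.
            self._node[follower_tag] = self._node[guide_tag]

    def view_of(self, user_tag: str) -> int:
        # A linked follower sees the node selected by the guide.
        return self._node[self._guide.get(user_tag, user_tag)]


sessions = NavigationSessions()
sessions.join("user-1", 2)
sessions.join("user-2", 10)
sessions.handle_move("user-1", 4)   # user-1 moves independently of user-2
sessions.link("user-2", "user-1")   # user-2 selects user-1 as guide
sessions.handle_move("user-1", 3)   # both users now share the guide's view
shared = (sessions.view_of("user-1"), sessions.view_of("user-2"))
```

After the last call both users view node 9. Calling `sessions.unlink("user-2")` would return independent control to the second user.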

Multiple Arrays

In certain applications, a user may also wish to navigate forward and backward through the environment, thereby moving closer to or further away from an object. Although it is within the scope of the present invention to use cameras with zoom capability, simply zooming towards an object does not change the user's image point perspective. One such embodiment in which users can move dimensionally forward and backward through the environment with a changing image point perspective will now be described with respect to FIG. 11 and continuing reference to FIG. 1. As will be understood by those skilled in the art, the arrays described with reference to FIG. 11 may be used with any server, storage device and user terminals described herein.

FIG. 11 illustrates a top plan view of another embodiment enabling the user to move left, right, up, down, forward or backwards through the environment. A plurality of cylindrical arrays (12-1 through 12-n) of differing diameters comprising a series of cameras 14 may be situated around an environment comprising one or more objects 1100, one cylindrical array at a time. Cameras 14 situated around the object(s) 1100 are positioned along an X and Z coordinate system. Accordingly, an array 12 may comprise a plurality of rings of the same circumference positioned at different positions (heights) throughout the z-axis to form a cylinder of cameras 14 around the object(s) 1100. This also allows each camera in each array 12 to have an associated, unique storage node address comprising an X and Z coordinate, i.e., Array₁(X,Z). In the present embodiment, for example, a coordinate value corresponding to an axis of a particular camera represents the number of camera positions along that axis the particular camera is displaced from a reference camera. In the present embodiment, from the user's perspective, the X axis runs around the perimeter of an array 12, and the Z axis runs down and up. Each storage node is associated with a camera view identified by its X, Z coordinate.

As described above, the outputs of the cameras 14 are coupled to one or more servers for gathering and transmitting the outputs to the server 18.

In one embodiment, because the environment is static, each camera requires only one storage location. The camera output may be stored in a logical arrangement, such as a matrix of n arrays, wherein each array has a plurality of (X,Z) coordinates. In one embodiment, the node addresses may comprise a specific coordinate within an array, i.e., Array₁(Xₙ,Zₙ), Array₂(Xₙ,Zₙ) through Arrayₙ(Xₙ,Zₙ). As described below, users can navigate the stored images in much the same manner as the user may navigate through an environment using live camera images.

The general operation of one embodiment of inputting images in storage device 20 for transmission to a user will now be described with reference to FIG. 12 and continuing reference to FIG. 11. As shown in step 1210, a cylindrical array 12-1 is situated around the object(s) located in an environment 1100. The view of each camera 14 is transmitted to server 18 in step 1220. Next, in step 1230, the electronic storage device 20 of the server 18 stores the output of each camera 14 at the storage node address associated with that camera 14. Storage of the images may be effectuated serially, from one camera 14 at a time within the array 12, or by simultaneous transmission of the image data from all of the cameras 14 of each array 12. Once the output for each camera 14 of array 12-1 is stored, cylindrical array 12-1 is removed from the environment (step 1240). In step 1250, a determination is made as to the availability of additional cylindrical arrays 12 of differing diameters to those already situated. If additional cylindrical arrays 12 are desired, the process repeats beginning with step 1210. When no additional arrays 12 are available for situating around the environment, the process of inputting images into storage devices 20 is complete (step 1260). At the end of the process, a matrix of addressable stored images exists.

Upon storing all of the outputs associated with the arrays 12-1 through 12-n, a user may navigate through the environment. Navigation is effectuated by accessing the input of the storage nodes by a user interface device 24. In the present embodiment, the user inputs generally include moving around the environment or object 1100 by moving to the left or right, moving higher or lower along the z-axis, moving through the environment closer or further from the object 1100, or some combination of moving around and through the environment. For example, a user may access the image stored in the node address Array₃(0,0) to view an object from the camera previously located at coordinate (0,0) of Array₃. The user may move directly forward, and therefore closer to the object 1100, by accessing the image stored in Array₂(0,0) and then Array₁(0,0). To move further away from the object and to the right and up, the user may move from the image stored in node address Array₁(0,0) and access the images stored in node address Array₂(1,1), followed by accessing the image stored in node address Array₃(2,2), and so on. A user may, of course, move among arrays and/or coordinates by any increments, changing the point perspective of the environment with each node. Additionally, a user may jump to a particular camera view of the environment. Thus, a user may move throughout the environment in a manner similar to that described above with respect to accessing output of live cameras. This embodiment, however, allows a user to access images that are stored in storage nodes as opposed to accessing live cameras. Moreover, this embodiment provides a convenient system and method to allow a user to move forward and backward in an environment.
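The movement among storage nodes in the foregoing example may be sketched as follows. The `StorageNode` type, the `step` function, and the sizes `N_ARRAYS`, `RING_SIZE` and `N_RINGS` are hypothetical. The sketch assumes that Array₁ is the innermost array, that the X coordinate wraps around the perimeter of a ring, and that a move outside the stored matrix leaves the user at the current node.

```python
from typing import NamedTuple


class StorageNode(NamedTuple):
    array: int  # which cylindrical array; 1 is assumed closest to the object
    x: int      # position around the perimeter of the array
    z: int      # ring (height) within the array


N_ARRAYS = 3   # hypothetical number of stored arrays
RING_SIZE = 8  # hypothetical cameras per ring
N_RINGS = 4    # hypothetical rings per array


def step(node: StorageNode, closer: int = 0, right: int = 0, up: int = 0) -> StorageNode:
    """Return the storage node reached by one combined move."""
    array = node.array - closer  # moving closer selects a lower-numbered array
    z = node.z + up
    if not (1 <= array <= N_ARRAYS and 0 <= z < N_RINGS):
        return node  # assumption: an impermissible move leaves the view unchanged
    x = (node.x + right) % RING_SIZE  # X wraps around the perimeter
    return StorageNode(array, x, z)


# Forward: Array3(0,0) -> Array2(0,0) -> Array1(0,0)
start = StorageNode(3, 0, 0)
nearer = step(step(start, closer=1), closer=1)

# Back out while moving right and up: Array1(0,0) -> Array2(1,1) -> Array3(2,2)
mid = step(nearer, closer=-1, right=1, up=1)
far = step(mid, closer=-1, right=1, up=1)
```

The two sequences reproduce the node addresses named in the example above. The image for each node would then be fetched from the storage device by its (array, X, Z) address.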

It should be noted that although each storage node is associated with a camera view identified by its X, Z coordinate of a particular array, other methods of identifying camera views and storage nodes can be used. For example, other coordinate systems, such as those noting angular displacement from a fixed reference point as well as coordinate systems that indicate relative displacement from the current camera node may be used. It should also be understood that the camera arrays 12 may be shapes other than cylindrical. Moreover, it is not essential, although often advantageous, that the camera arrays 12 surround the entire environment.

It is to be understood that the foregoing user inputs, namely, move clockwise, move counter-clockwise, up, down, closer to the environment, and further from the environment, are merely general descriptions of movement through the environment. Although the present invention is not so limited, in the present preferred embodiment, movement in each of these general directions is further defined based upon the user input. Moreover, the output generated by the server to the user may be mixed when moving among adjacent storage nodes associated with environment views (along the x axis, z axis, or among juxtaposed arrays) to generate seamless movement throughout the environment. Mixing may be accomplished by, but is not limited to, the processes described above.

Embodiments Covered

Although the present invention has been described in terms of certain preferred embodiments, other embodiments that are apparent to those of ordinary skill in the art are also intended to be within the scope of this invention. Accordingly, the scope of the present invention is intended to be limited only by the claims appended hereto.

What is claimed is: 1-30. (canceled)
 31. A system for providing a first user with seamless viewing of an environment, the system comprising: an array of cameras each having a progressively different point perspective and a field of view of the environment that overlaps that of adjacent cameras, the array of cameras for capturing electronic imagery of the environment; one or more electronic storage devices for storing electronic imagery; a first user interface device having inputs for selecting at least a first view through at least a portion of the array from which to view the environment; one or more processing elements configured in accordance with computer programming to: tween electronic imagery of progressively different perspectives of the environment using local scene characteristics of the environment; and based on first user inputs from the first user interface device, provide to the first user interface device tweened imagery along the first view, thereby allowing the first user to obtain a seamless view through the environment.
 32. The system of claim 31, wherein the first user interface includes a first user display device for providing a first user with the tweened imagery of the environment in response to the first user inputs.
 33. The system of claim 31, wherein the one or more processing elements include a processing element on the first user interface device.
 34. The system of claim 31, further comprising a second user interface device having second user inputs for selecting at least a second view through at least a portion of the array, and wherein the one or more processing elements are further configured to, based on the second user inputs, provide to the user interface device tweened imagery along the second view, thereby allowing the first user and the second user to navigate simultaneously and independently through the tweened images of the environment.
 35. The system of claim 31, wherein the first user interface device is provided stereoscopic imagery obtained from a plurality of cameras in the array.
 36. The system of claim 31, wherein the first user interface device includes a first display device and the second user interface device includes a second display device, and wherein the first and second interface devices and first and second display devices are different types of devices.
 37. The system of claim 31, wherein the one or more processing elements are configured to receive the first user inputs from the first user interface device, via a first communication link, and receive the second user inputs from the second user interface device, via a second communication link, wherein the first communication link is a different type than the second communication link.
 38. The system of claim 31, wherein the one or more electronic storage devices store output of a data device other than a camera, and the one or more processing elements are further configured to provide tweened imagery to the first user interface device based on both first user inputs and the output of the data device.
 39. A method of providing at least a first user with at least a first view through electronic imagery of an environment captured from an array of cameras, each camera having a progressively different point perspective and a field of view of the environment that overlaps that of adjacent cameras, the method comprising: receiving, from a first user interface device associated with the first user, first user inputs associated with viewing the environment along the first view; generating tweened imagery from electronic imagery of progressively different perspectives of the environment from the cameras, based on local scene characteristics of the environment; based on the first user inputs associated with viewing the environment along the first view through the environment, causing display at the first user interface device tweened imagery of progressively different perspectives along the first view, thereby allowing a first user to navigate along the first view of the environment.
 40. The method of claim 39, wherein the first user interface includes a first user display device for providing a first user with a display of the environment in response to first user inputs.
 41. The method of claim 39, wherein the generating tweened imagery includes one or more processing elements configured in accordance with computer programming generating the tweened imagery.
 42. The method of claim 39, wherein stereoscopic imagery obtained from a plurality of cameras in the array is displayed at the first user interface device.
 43. The method of claim 39, wherein the method further comprises: receiving, from a second user interface device associated with a second user, second user inputs associated with viewing the environment along a second view through at least a portion of the array; and based on the second user inputs associated with viewing the environment along the second view through the environment, providing to the second user interface device tweened imagery, thereby allowing the first user and second user to navigate simultaneously and independently through the tweened imagery of the environment.
 44. The method of claim 43, wherein the first user interface device includes a first display device and the second user interface device includes a second display device and wherein the first and second interface devices and first and second display devices are different types of devices.
 45. The method of claim 43, wherein the first user inputs are received from the first user interface device via a first communication link, and the second user inputs are received from the second user interface device via a second communication link, wherein the first communication link is a different type than the second communication link.
 46. The method of claim 39, wherein the method comprises storing the tweened images in electronic storage.
 47. The method of claim 39, further comprising providing the tweened imagery to the first user interface device based on both first user inputs and output of a data device other than a camera.
 48. A non-transient computer readable medium having computer programming thereon, the computer programming operable to configure one or more processing elements to: generate tweened imagery from electronic imagery captured from an array of cameras, wherein each camera has a progressively different point perspective and a field of view of an environment that overlaps that of adjacent cameras; provide tweened imagery of at least a first view of the environment to a first user display, the tweened imagery of the first view being provided based on first user inputs associated with navigating along the first view.
 49. The computer readable medium of claim 48, wherein the computer programming is further operable to configure the one or more processing elements to retrieve electronic imagery from one or more electronic storage devices and generate the tweened imagery from the retrieved electronic imagery.
 50. The computer readable medium of claim 48, wherein the computer programming is further operable to configure the one or more processing elements to select tweened imagery based on the first user inputs.
 51. The computer readable medium of claim 48, wherein the computer programming is further operable to configure the one or more processing elements to provide stereoscopic imagery to the first user display, the stereoscopic imagery generated from multiple cameras in the array.
 52. The computer readable medium of claim 48, wherein the computer programming is further operable to configure the one or more processing elements to provide tweened imagery of at least a second view of the environment to a second user display, the tweened imagery of the second view being provided based on second user inputs associated with navigating along the second view.
 53. The computer readable medium of claim 48, wherein the first user display is part of a first user interface device and the one or more processing elements include a processing element at the first user interface device.